Results 1 - 19 of 19
1.
Proteomics ; 24(9): e2300257, 2024 May.
Article in English | MEDLINE | ID: mdl-38263811

ABSTRACT

With the notable surge in therapeutic peptide development, various peptides have emerged as potential agents against virus-induced diseases. Viral entry inhibitory peptides (VEIPs), a subset of antiviral peptides (AVPs), offer a promising avenue as entry inhibitors (EIs) with distinct advantages over chemical counterparts. Despite this, a comprehensive analytical platform for characterizing these peptides and their effectiveness in blocking viral entry is still lacking. In this study, we introduce an in silico approach that leverages bioinformatics analysis and machine learning to characterize and identify novel VEIPs. Cross-validation results show that a model combining sequence-based features predicts VEIPs with high accuracy, and independent testing validates this performance. Additionally, an EI type model has been developed to distinguish peptides that act specifically as EIs from AVPs with other activities. Notably, we present iDVEIP, a web-based tool accessible at http://mer.hc.mmh.org.tw/iDVEIP/, designed for automatic analysis and prediction of VEIPs. The tool supports comprehensive analyses of peptide characteristics and provides detailed amino acid composition data for each prediction. Furthermore, we showcase the tool's utility in identifying EIs against severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).
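The per-prediction amino acid composition that iDVEIP reports can be illustrated with a minimal sketch. It assumes composition is expressed as the percentage of each of the 20 standard residues in the peptide; the helper name `aa_composition` and the example peptide are hypothetical and do not come from the tool itself.

```python
# Minimal sketch of amino acid composition (AAC).
# Hypothetical helper for illustration; not the iDVEIP implementation.
STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(peptide: str) -> dict[str, float]:
    """Return the percentage of each standard residue in `peptide`."""
    seq = peptide.upper()
    if not seq or any(res not in STANDARD_AA for res in seq):
        raise ValueError("peptide must contain only the 20 standard residues")
    return {aa: 100.0 * seq.count(aa) / len(seq) for aa in STANDARD_AA}


if __name__ == "__main__":
    comp = aa_composition("GKWKKLLKK")  # hypothetical example peptide
    # Show only residues that are actually present, rounded to one decimal.
    print({aa: round(pct, 1) for aa, pct in comp.items() if pct > 0})
```

Reporting percentages (rather than raw counts) keeps peptides of different lengths comparable.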


Subject(s)
Antiviral Agents , Computational Biology , Machine Learning , Peptides , SARS-CoV-2 , Virus Internalization , Virus Internalization/drug effects , Antiviral Agents/pharmacology , Antiviral Agents/chemistry , Humans , Peptides/chemistry , Peptides/pharmacology , Computational Biology/methods , SARS-CoV-2/drug effects , COVID-19 Drug Treatment , Computer Simulation , COVID-19/virology , Software
2.
Taiwan J Obstet Gynecol ; 62(5): 687-696, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37678996

ABSTRACT

OBJECTIVE: With the rising number of non-vaginal deliveries worldwide, scientists have been concerned about the influence of delivery mode on maternal and neonatal microbiomes. Although the birth rate in Taiwan is declining rapidly, more than 30 percent of newborns are delivered by caesarean section every year. However, it remains unclear whether delivery mode affects the postpartum maternal microbiome and whether it affects mother-to-newborn vertical transmission of bacteria at birth. MATERIALS AND METHODS: To address this, we recruited 30 mother-newborn pairs, including 23 vaginal delivery (VD) pairs and seven caesarean section (CS) pairs. We investigated the development of the maternal prenatal and postnatal microbiomes across multiple body habitats, and explored the early acquisition of the neonatal gut microbiome through a vertical multi-body-site microbiome analysis. RESULTS AND CONCLUSION: The results indicate that delivery mode, whichever it is, only slightly affects the maternal microbiome in multiple body habitats from pregnancy to postpartum. On the other hand, about 95% of species in the meconium microbiome were derived from one of the maternal body habitats; notably, infants born by caesarean section acquired bacterial communities resembling their mothers' oral microbiome. Consequently, delivery mode plays a crucial role in the initial colonization of the neonatal gut microbiome, potentially affecting children's health and development.


Subject(s)
Cesarean Section , Microbiota , Infant, Newborn , Pregnancy , Child , Infant , Humans , Female , RNA, Ribosomal, 16S/genetics , Genes, rRNA , Microbiota/genetics , Delivery, Obstetric
3.
Int J Med Sci ; 19(14): 2008-2021, 2022.
Article in English | MEDLINE | ID: mdl-36483599

ABSTRACT

Endometrial cancer is one of the most common malignancies affecting women in developed countries. Resection of the uterus or the lesion area is usually the first option for simple and efficient therapy. Therefore, new therapeutic drugs are needed to reduce the extent of surgery and preserve fertility. Anticancer peptides (ACPs) are bioactive peptides with lower toxicity and higher specificity than chemical drugs. This study addresses an ACP, herein named Q7, which could downregulate 24-dehydrocholesterol reductase (DHCR24) to disrupt lipid raft formation and subsequently affect the AKT signaling pathway of HEC-1-A cells, suppressing tumorigenic behaviors such as proliferation and migration. Moreover, a lipo-PEI-PEG complex (LPPC) was used to enhance the anticancer activity of Q7 in vitro and to exert its effects efficiently on HEC-1-A cells. Furthermore, LPPC-Q7 exhibited a synergistic effect in combination with doxorubicin or paclitaxel. In summary, Q7 was shown for the first time to exhibit an anticancer effect on endometrial cancer cells, and combining it with LPPC efficiently improved its cytotoxicity.


Subject(s)
Endometrial Neoplasms , Oxidoreductases Acting on CH-CH Group Donors , Humans , Female , Endometrial Neoplasms/drug therapy , Endometrial Neoplasms/genetics , Peptides/pharmacology , Peptides/therapeutic use , Nerve Tissue Proteins
4.
Brief Bioinform ; 23(6)2022 11 19.
Article in English | MEDLINE | ID: mdl-36215051

ABSTRACT

Antiretroviral peptides are bioactive peptides that inhibit retroviruses through various mechanisms. Among them, viral integrase inhibitory peptides (VINIPs) are a class of antiretroviral peptides that can block the action of integrase proteins, which are essential for retroviral replication. Although the number of experimentally verified bioactive peptides has increased significantly, in silico machine learning approaches that can effectively predict peptides with integrase inhibitory activity are still lacking. Here, we have developed the first prediction model for identifying novel VINIPs using sequence characteristics, and a hybrid feature set was considered to improve predictive ability. Performance was evaluated by 5-fold cross-validation on the training dataset, and the results indicate that the proposed model is capable of predicting VINIPs, with a sensitivity of 85.82%, a specificity of 88.81%, an accuracy of 88.37%, a balanced accuracy of 87.32% and a Matthews correlation coefficient (MCC) of 0.64. Most importantly, the model also performs consistently well in independent testing. To sum up, we propose the first computational approach for identifying and characterizing VINIPs, which can be considered novel antiretroviral therapy agents. Ultimately, to facilitate further research and development, we have developed iDVIP, an automatic computational tool for predicting VINIPs, which is now freely available at http://mer.hc.mmh.org.tw/iDVIP/.
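The metrics above follow the usual confusion-matrix definitions. The sketch below computes them from hypothetical counts (not the study's data); the `ConfusionCounts` and `binary_metrics` names are illustrative. Balanced accuracy is the mean of sensitivity and specificity, which agrees with the reported figures: (85.82 + 88.81) / 2 ≈ 87.32.

```python
import math
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def binary_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Sensitivity, specificity, accuracy, balanced accuracy and MCC."""
    sn = c.tp / (c.tp + c.fn)  # sensitivity: recall on the positive class
    sp = c.tn / (c.tn + c.fp)  # specificity: recall on the negative class
    acc = (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)
    denom = math.sqrt((c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn))
    mcc = (c.tp * c.tn - c.fp * c.fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "BAcc": (sn + sp) / 2, "MCC": mcc}


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(binary_metrics(ConfusionCounts(tp=80, fp=30, tn=270, fn=20)))
```

Accuracy alone is sensitive to class imbalance, which is why balanced accuracy and MCC are reported alongside it.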


Subject(s)
HIV Infections , Integrases , Humans , Amino Acid Sequence , Peptides/pharmacology , Peptides/chemistry , Proteins/chemistry
5.
Sci Rep ; 11(1): 13594, 2021 06 30.
Article in English | MEDLINE | ID: mdl-34193950

ABSTRACT

Anticancer peptides (ACPs) are bioactive peptides that could serve as a novel type of anticancer drug with several advantages over chemistry-based drugs, including high specificity, strong tumor penetration capacity, and low toxicity to normal cells. As the number of experimentally verified bioactive peptides has increased significantly, in silico approaches are imperative for investigating the characteristics of ACPs. However, methods for investigating differences in the physicochemical properties of ACPs are lacking. In this study, we compared the N- and C-terminal amino acid composition of each peptide and defined three major subtypes of ACPs based on the distribution of positively charged residues. For the first time, we developed a two-step machine learning model for identifying the subtypes of ACPs, which first classifies the input data into the corresponding group and then applies that group's classifier. Further, to improve predictive power, hybrid feature sets were considered for prediction. Evaluation by five-fold cross-validation showed that the two-step model trained with sequence-based features and physicochemical properties was most effective in discriminating between ACPs and non-ACPs. The two-step model trained with the hybrid features performed well, with a sensitivity of 86.75%, a specificity of 85.75%, an accuracy of 86.08%, and a Matthews correlation coefficient (MCC) of 0.703. Furthermore, the model also performed consistently well on the independent testing set, with a sensitivity of 77.6%, a specificity of 94.74%, an accuracy of 88.99% and an MCC of 0.75. Finally, the two-step model has been implemented as a web-based tool, iDACP, which is now freely available at http://mer.hc.mmh.org.tw/iDACP/.
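The two-step idea, routing a peptide to a subgroup before classifying it, can be sketched as follows. The abstract does not give the subtype definitions, so the terminal window of 5 residues, the choice of K/R/H as positively charged residues, and the three subgroup labels are hypothetical placeholders; `classifiers` stands for any per-subgroup models the user supplies.

```python
# Hypothetical sketch of a two-step (route, then classify) scheme.
POSITIVE = set("KRH")  # assumption: His counted as positively charged


def terminal_positive_counts(peptide: str, window: int = 5) -> tuple[int, int]:
    """Count positively charged residues in the N- and C-terminal windows."""
    seq = peptide.upper()
    n_term, c_term = seq[:window], seq[-window:]
    return (sum(r in POSITIVE for r in n_term), sum(r in POSITIVE for r in c_term))


def assign_subtype(peptide: str, window: int = 5) -> str:
    """Step 1: route the peptide to a subgroup (placeholder labels and rule)."""
    n_pos, c_pos = terminal_positive_counts(peptide, window)
    if n_pos > c_pos:
        return "N-enriched"
    if c_pos > n_pos:
        return "C-enriched"
    return "balanced"


def two_step_predict(peptide: str, classifiers: dict, window: int = 5):
    """Step 2: apply the classifier trained for the assigned subgroup."""
    return classifiers[assign_subtype(peptide, window)](peptide)


if __name__ == "__main__":
    # Stand-in "models" that simply report which subgroup handled the input.
    models = {name: (lambda p, n=name: n) for name in ("N-enriched", "C-enriched", "balanced")}
    print(two_step_predict("KKRAAAAAAA", models))
```

Training one classifier per subgroup lets each model specialize on a more homogeneous set of sequences, at the cost of fewer training examples per model.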


Subject(s)
Amino Acid Sequence , Antineoplastic Agents/chemistry , Computational Biology , Machine Learning , Peptides , Humans , Peptides/chemistry , Peptides/genetics
6.
BMC Bioinformatics ; 21(1): 568, 2020 Dec 09.
Article in English | MEDLINE | ID: mdl-33297954

ABSTRACT

BACKGROUND: Protein phosphoglycerylation, the addition of 1,3-bisphosphoglyceric acid (1,3-BPG) to a lysine residue of a protein to form a 3-phosphoglyceryl-lysine, is a reversible and non-enzymatic post-translational modification (PTM) that plays a regulatory role in glucose metabolism and the glycolytic process. As the number of experimentally verified phosphoglycerylated sites has increased significantly, statistical or machine learning methods are imperative for investigating the characteristics of phosphoglycerylation sites. Currently, research into phosphoglycerylation is very limited, and only a few resources are available for the computational identification of phosphoglycerylation sites. RESULT: We present a bioinformatics investigation of phosphoglycerylation sites based on sequence-based features. TwoSampleLogo analysis reveals that the regions surrounding phosphoglycerylation sites contain a relatively high proportion of positively charged amino acids, especially in the upstream flanking region. Additionally, PTM-Logo results show that non-polar and aliphatic amino acids are more abundant around phosphoglycerylated lysines, which may play a functional role in discriminating between phosphoglycerylation and non-phosphoglycerylation sites. Many types of features were adopted to build the prediction model on the training dataset, including amino acid composition, amino acid pair composition, positional weighted matrix and position-specific scoring matrix. Further, to improve predictive power, the top-ranked features by F-score were selected as the final combination for classification, and predictive models were trained using decision tree (DT), random forest (RF) and support vector machine (SVM) classifiers. Evaluation by five-fold cross-validation showed that the selected features were most effective in discriminating between phosphoglycerylated and non-phosphoglycerylated sites.
CONCLUSION: The SVM model trained with the selected sequence-based features performed well, with a sensitivity of 77.5%, a specificity of 73.6%, an accuracy of 74.9%, and a Matthews correlation coefficient (MCC) of 0.49. Furthermore, the model also performed consistently well on the independent testing set, yielding a sensitivity of 75.7% and a specificity of 64.9%. Finally, the model has been implemented as a web-based system, iDPGK, which is now freely available at http://mer.hc.mmh.org.tw/iDPGK/.
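F-score feature ranking is commonly computed with the formula of Chen and Lin, which divides the spread of the two class means around the overall mean by the sum of the within-class sample variances. A minimal sketch, assuming that definition (the abstract does not state which F-score variant was used); `f_score` and `rank_features` are hypothetical helper names.

```python
from statistics import mean, variance


def f_score(pos: list[float], neg: list[float]) -> float:
    """F-score of one feature; needs at least two samples per class."""
    overall = mean(pos + neg)
    # Between-class term: distance of each class mean from the overall mean.
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    # Within-class term: sample variances (n - 1 denominators) of each class.
    within = variance(pos) + variance(neg)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_features(pos_rows: list[list[float]], neg_rows: list[list[float]]):
    """Return (feature indices sorted by descending F-score, raw scores)."""
    n_features = len(pos_rows[0])
    scores = [
        f_score([row[j] for row in pos_rows], [row[j] for row in neg_rows])
        for j in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


if __name__ == "__main__":
    # Hypothetical 2-feature data: feature 0 separates classes, feature 1 does not.
    print(rank_features([[1.0, 0.5], [1.2, 0.6], [0.8, 0.4]],
                        [[0.0, 0.5], [0.2, 0.6], [-0.2, 0.4]]))
```

F-score evaluates each feature independently, so it is cheap to compute but ignores interactions between features; selecting the top-k features by this score is a common filter step before training an SVM.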


Subject(s)
Computational Biology/methods , Lysine/metabolism , Software , Amino Acid Sequence , Glycosylation , Internet , Lysine/chemistry , Machine Learning , Position-Specific Scoring Matrices , Protein Processing, Post-Translational , Proteins/chemistry , ROC Curve , Reproducibility of Results , Support Vector Machine
7.
Genomics Proteomics Bioinformatics ; 18(2): 208-219, 2020 04.
Article in English | MEDLINE | ID: mdl-32592791

ABSTRACT

Protein succinylation is a biochemical reaction in which a succinyl group (-CO-CH2-CH2-CO-) is attached to the lysine residue of a protein molecule. Lysine succinylation plays important regulatory roles in living cells. However, studies in this field are limited by the difficulty in experimentally identifying the substrate site specificity of lysine succinylation. To facilitate this process, several tools have been proposed for the computational identification of succinylated lysine sites. In this study, we developed an approach to investigate the substrate specificity of lysine succinylated sites based on amino acid composition. Using experimentally verified lysine succinylated sites collected from public resources, the significant differences in position-specific amino acid composition between succinylated and non-succinylated sites were represented using the Two Sample Logo program. These findings enabled the adoption of an effective machine learning method, support vector machine, to train a predictive model with not only the amino acid composition, but also the composition of k-spaced amino acid pairs. After the selection of the best model using a ten-fold cross-validation approach, the selected model significantly outperformed existing tools based on an independent dataset manually extracted from published research articles. Finally, the selected model was used to develop a web-based tool, SuccSite, to aid the study of protein succinylation. Two proteins were used as case studies on the website to demonstrate the effective prediction of succinylation sites. We will regularly update SuccSite by integrating more experimental datasets. SuccSite is freely accessible at http://csb.cse.yzu.edu.tw/SuccSite/.
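The composition of k-spaced amino acid pairs (CKSAAP) counts, for each gap k, how often each of the 400 ordered residue pairs occurs with exactly k residues between them, normalized by the number of pairs in the window. A sketch under assumed settings: `k_max = 2` and the skipping of non-standard padding characters are illustrative choices, not values from the paper, and `cksaap` is a hypothetical helper name.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def cksaap(window: str, k_max: int = 2) -> list[float]:
    """CKSAAP encoding: 400 normalized pair frequencies for each gap k in 0..k_max."""
    features: list[float] = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_pairs = len(window) - k - 1  # number of residue pairs separated by k
        for i in range(n_pairs):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # skip pairs containing padding/non-standard characters
                counts[pair] += 1
        features.extend(counts[p] / n_pairs if n_pairs > 0 else 0.0 for p in PAIRS)
    return features


if __name__ == "__main__":
    vec = cksaap("AKAK", k_max=1)
    print(len(vec), vec[PAIRS.index("AK")])
```

The vector length grows as 400 × (k_max + 1), so k_max is usually kept small for short site windows.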


Subject(s)
Amino Acids/metabolism , Succinic Acid/metabolism , Amino Acid Sequence , Databases, Protein , Dipeptides/metabolism , Humans , Lysine/metabolism , Machine Learning , Proteins/chemistry , Proteins/metabolism , ROC Curve , Substrate Specificity , Support Vector Machine
8.
BMC Bioinformatics ; 19(Suppl 13): 384, 2019 Feb 04.
Article in English | MEDLINE | ID: mdl-30717647

ABSTRACT

BACKGROUND: Glutarylation, the addition of a glutaryl group (five carbons) to a lysine residue of a protein molecule, is an important post-translational modification that plays a regulatory role in a variety of physiological and biological processes. As the number of experimentally identified glutarylated peptides increases, it becomes imperative to investigate substrate motifs to enhance the study of protein glutarylation. We carried out a bioinformatics investigation of glutarylation sites based on amino acid composition using a public database containing information on 430 non-homologous glutarylation sites. RESULTS: TwoSampleLogo analysis indicates that positively charged and polar amino acids surrounding glutarylated sites may be associated with the substrate site specificity of protein glutarylation. Additionally, the chi-squared test was used to explore the intrinsic interdependence between two positions around glutarylation sites. Further, maximal dependence decomposition (MDD), which partitions a large-scale dataset into subgroups with statistically significant amino acid conservation, was used to capture motif signatures of glutarylation sites. We considered single features, such as amino acid composition (AAC), amino acid pair composition (AAPC), and composition of k-spaced amino acid pairs (CKSAAP), as well as the effectiveness of incorporating MDD-identified substrate motifs into an integrated prediction model. Evaluation by five-fold cross-validation showed that AAC was most effective in discriminating between glutarylation and non-glutarylation sites using a support vector machine (SVM). CONCLUSIONS: The SVM model integrating MDD-identified substrate motifs performed well, with a sensitivity of 0.677, a specificity of 0.619, an accuracy of 0.638, and a Matthews correlation coefficient (MCC) of 0.28.
Using an independent testing dataset (46 glutarylated and 92 non-glutarylated sites) obtained from the literature, we demonstrated that the integrated SVM model could improve the predictive performance effectively, yielding a balanced sensitivity and specificity of 0.652 and 0.739, respectively. This integrated SVM model has been implemented as a web-based system (MDDGlutar), which is now freely available at http://csb.cse.yzu.edu.tw/MDDGlutar/ .
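The chi-squared test for dependence between two positions can be sketched on aligned sequence windows by building the contingency table of residues observed at positions i and j. This computes only the Pearson statistic and its degrees of freedom; a p-value could then be obtained from the chi-squared survival function, for example `scipy.stats.chi2.sf(stat, dof)`. The function name is hypothetical, and the study may have grouped residues differently.

```python
from collections import Counter


def chi_squared_dependence(windows: list[str], i: int, j: int) -> tuple[float, int]:
    """Pearson chi-squared statistic and degrees of freedom for the association
    between residues at positions i and j across equal-length aligned windows."""
    joint = Counter((w[i], w[j]) for w in windows)  # observed cell counts
    row = Counter(w[i] for w in windows)  # marginal counts at position i
    col = Counter(w[j] for w in windows)  # marginal counts at position j
    n = len(windows)
    chi2 = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n  # expected count under independence
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(row) - 1) * (len(col) - 1)
    return chi2, dof


if __name__ == "__main__":
    # Hypothetical 2-residue windows where position 0 fully determines position 1.
    print(chi_squared_dependence(["AK", "AK", "RE", "RE"], 0, 1))
```

Only residues actually observed at each position enter the table, so no expected count is ever zero.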


Subject(s)
Computational Biology/methods , Glutarates/metabolism , Lysine/metabolism , Amino Acid Motifs , Amino Acid Sequence , Animals , Databases, Protein , Lysine/chemistry , Mice , Proteins/chemistry , ROC Curve , Reproducibility of Results , Substrate Specificity , Support Vector Machine , User-Computer Interface
9.
Nucleic Acids Res ; 47(D1): D298-D308, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30418626

ABSTRACT

The dbPTM (http://dbPTM.mbc.nctu.edu.tw/) has been maintained for over 10 years with the aim of providing functional and structural analyses of post-translational modifications (PTMs). In this update, dbPTM not only integrates more experimentally validated PTMs from available databases and through manual curation of the literature but also provides PTM-disease associations based on non-synonymous single nucleotide polymorphisms (nsSNPs). High-throughput deep sequencing technology has led to a surge, in both volume and scope, of data on associations between SNPs and diseases. This update therefore integrated disease-associated nsSNPs from dbSNP based on genome-wide association studies. PTM substrate sites located within a specified distance (in amino acids) of residues altered by nsSNPs were deemed to be associated with the involved diseases. In recent years, increasing evidence for crosstalk between PTMs has been reported. Although mass spectrometry-based proteomics has substantially improved our knowledge about the substrate site specificity of single PTMs, the possibility that combinatorial PTMs act in concert to regulate protein function and activity has been neglected. Because information about the concurrent frequency and functional relevance of PTM crosstalk is relatively limited, in this update, PTM sites neighboring other PTM sites within a specified window length were subjected to motif discovery and functional enrichment analysis. This update highlights the current challenges in PTM crosstalk investigation and breaks the bottleneck of how proteomics may contribute to understanding PTM codes, revealing the next level of data complexity and proteomic limitation in prospective PTM research.
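Identifying PTM sites that neighbor other PTM sites within a window, the starting point for the crosstalk analysis described above, can be sketched as a pairwise distance check. The 5-residue default window and the restriction to pairs of different modification types are assumptions; `PTMSite` and `crosstalk_pairs` are hypothetical names.

```python
from itertools import combinations
from typing import NamedTuple


class PTMSite(NamedTuple):
    position: int  # 1-based residue position in the protein
    ptm_type: str  # modification type, e.g. "Phosphorylation"


def crosstalk_pairs(sites: list[PTMSite], window: int = 5) -> list[tuple[PTMSite, PTMSite]]:
    """Pairs of sites with different PTM types lying within `window` residues
    of each other. The default window and the different-type rule are assumptions."""
    pairs = []
    for a, b in combinations(sorted(sites), 2):  # sorted by position, then type
        if a.ptm_type != b.ptm_type and abs(b.position - a.position) <= window:
            pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    # Hypothetical sites on one protein, for illustration only.
    demo = [PTMSite(10, "Phosphorylation"), PTMSite(13, "Acetylation"),
            PTMSite(30, "Phosphorylation"), PTMSite(12, "Phosphorylation")]
    for a, b in crosstalk_pairs(demo):
        print(a, "<->", b)
```

The quadratic pairwise loop is adequate for the site counts of a single protein; a sorted sliding window would scale better for proteome-wide scans.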


Subject(s)
Databases, Protein , Protein Processing, Post-Translational , Amino Acid Motifs , Computational Biology , Genome-Wide Association Study , Glycosylation , High-Throughput Nucleotide Sequencing , Humans , Mass Spectrometry/methods , Phosphorylation , Polymorphism, Single Nucleotide , Proteomics/methods , Structure-Activity Relationship , Substrate Specificity , User-Computer Interface
10.
PLoS One ; 12(6): e0179529, 2017.
Article in English | MEDLINE | ID: mdl-28662047

ABSTRACT

S-palmitoylation, the covalent attachment of a 16-carbon palmitic acid to a cysteine residue via a thioester linkage, is an important reversible lipid modification that plays a regulatory role in a variety of physiological and biological processes. As the number of experimentally identified S-palmitoylated peptides increases, it is imperative to investigate substrate motifs to facilitate the study of protein S-palmitoylation. Based on 710 non-homologous S-palmitoylation sites obtained from published databases and the literature, we carried out a bioinformatics investigation of S-palmitoylation sites based on amino acid composition. Two Sample Logo analysis indicates that positively charged and polar amino acids surrounding S-palmitoylated sites may be associated with the substrate site specificity of protein S-palmitoylation. Additionally, maximal dependence decomposition (MDD) was applied to explore the motif signatures of S-palmitoylation sites by categorizing a large-scale dataset into subgroups with statistically significant conservation of amino acids. Single features such as amino acid composition (AAC), amino acid pair composition (AAPC), position-specific scoring matrix (PSSM), position weight matrix (PWM), amino acid substitution matrix (BLOSUM62), and accessible surface area (ASA) were considered, along with the effectiveness of incorporating MDD-identified substrate motifs into a two-layered prediction model. Evaluation by five-fold cross-validation showed that a hybrid of AAC and PSSM performs best at discriminating between S-palmitoylation and non-S-palmitoylation sites using a support vector machine (SVM). The two-layered SVM model integrating MDD-identified substrate motifs performed well, with a sensitivity of 0.79, a specificity of 0.80, an accuracy of 0.80, and a Matthews correlation coefficient (MCC) of 0.45.
Using an independent testing dataset (613 S-palmitoylated and 5412 non-S-palmitoylated sites) obtained from the literature, we demonstrated that the two-layered SVM model could outperform other prediction tools, yielding a balanced sensitivity and specificity of 0.690 and 0.694, respectively. This two-layered SVM model has been implemented as a web-based system (MDD-Palm), which is now freely available at http://csb.cse.yzu.edu.tw/MDDPalm/.
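A position weight matrix (PWM), one of the features listed above, can be built from aligned positive windows and used to score candidate windows. The sketch assumes base-2 log-odds against a uniform background of 1/20 and a pseudocount of 1; these are common choices, not settings reported in the abstract, and the helper names are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pwm(windows: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log-odds PWM (base 2) from equal-length aligned windows,
    against an assumed uniform background of 1/20 per residue."""
    background = 1.0 / len(AMINO_ACIDS)
    pwm = []
    for pos in range(len(windows[0])):
        column = [w[pos] for w in windows]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pwm.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pwm


def pwm_score(pwm: list[dict[str, float]], window: str) -> float:
    """Sum of per-position log-odds; non-standard residues (e.g. padding) score 0."""
    return sum(col.get(res, 0.0) for col, res in zip(pwm, window))


if __name__ == "__main__":
    # Hypothetical positive windows centered on a cysteine.
    pwm = build_pwm(["ACA", "ACA", "GCA"])
    print(round(pwm_score(pwm, "ACA"), 3), round(pwm_score(pwm, "WWW"), 3))
```

The pseudocount keeps unseen residues from producing log(0), which matters for small training sets.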


Subject(s)
Palmitic Acid/metabolism , Protein S/metabolism , Amino Acids/metabolism , Substrate Specificity
11.
BMC Bioinformatics ; 18(Suppl 3): 66, 2017 Mar 14.
Article in English | MEDLINE | ID: mdl-28361707

ABSTRACT

BACKGROUND: Protein carbonylation, an irreversible and non-enzymatic post-translational modification (PTM), is often used as a marker of oxidative stress. When reactive oxygen species (ROS) oxidize amino acid side chains, carbonyl (CO) groups are produced, especially on lysine (K), arginine (R), threonine (T), and proline (P). Nevertheless, owing to the lack of information about carbonylated substrate specificity, we were encouraged to develop a systematic method for a comprehensive investigation of protein carbonylation sites. RESULTS: After the removal of redundant data from multiple carbonylation-related articles, a total of 226 human carbonylated proteins were used as the training dataset, comprising 307, 126, 128, and 129 carbonylation sites for K, R, T and P residues, respectively. To identify useful features for predicting carbonylation sites, the linear amino acid sequence was used not only to build the predictive model from the training dataset but also to compare its predictive effectiveness with other types of features, including amino acid composition (AAC), amino acid pair composition (AAPC), position-specific scoring matrix (PSSM), positional weighted matrix (PWM), solvent-accessible surface area (ASA), and physicochemical properties. The investigation of position-specific amino acid composition revealed that positively charged amino acids (K and R) are remarkably enriched around carbonylated sites, which may play a functional role in discriminating between carbonylation and non-carbonylation sites. A variety of predictive models were built using various features and three different machine learning methods. Based on evaluation by five-fold cross-validation, the models trained with the PWM feature provided better sensitivity on the positive training dataset, while the models trained with the AAindex feature achieved higher specificity on the negative training dataset.
Additionally, the models trained using hybrid features, including PWM, AAC and AAindex, obtained the best MCC values of 0.432, 0.472, 0.443 and 0.467 for K, R, T and P residues, respectively. CONCLUSION: Compared with an existing prediction tool, the selected models trained with hybrid features provided promising accuracy on an independent testing dataset. In short, this work not only characterized the carbonylated substrate preference but also demonstrated that the proposed method could provide a feasible means of accelerating the preliminary discovery of protein carbonylation.
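Site-level features such as AAC, AAPC and PWM are computed on fixed-length fragments centered on each candidate residue. A minimal extraction sketch, assuming a flank of 10 residues on each side (a 21-residue window) and '-' padding at the termini; neither value is stated in the abstract, and `extract_windows` is a hypothetical helper.

```python
def extract_windows(sequence: str, residues: str = "KRTP", flank: int = 10,
                    pad: str = "-") -> list[tuple[int, str]]:
    """Return (1-based position, window) for every candidate residue in
    `sequence`, padding with `pad` where the window runs past a terminus."""
    padded = pad * flank + sequence + pad * flank
    windows = []
    for i, res in enumerate(sequence):
        if res in residues:
            # Residue i of `sequence` sits at index i + flank of `padded`,
            # so its centered window spans padded[i : i + 2 * flank + 1].
            windows.append((i + 1, padded[i : i + 2 * flank + 1]))
    return windows


if __name__ == "__main__":
    # Hypothetical sequence; flank reduced to 3 for a readable demo.
    for pos, win in extract_windows("MKTAYIAKQR", flank=3):
        print(pos, win)
```

Padding keeps every window the same length, so sites near the N- or C-terminus can still be encoded with fixed-size feature vectors.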


Subject(s)
Amino Acids/chemistry , Chemical Phenomena , Protein Carbonylation , Amino Acid Sequence , Arginine/chemistry , Humans , Lysine/chemistry , Models, Theoretical , Position-Specific Scoring Matrices , Proline/chemistry , Protein Processing, Post-Translational , Proteins/chemistry , Reactive Oxygen Species/chemistry , Substrate Specificity , Threonine/chemistry
12.
BMC Syst Biol ; 11(Suppl 7): 137, 2017 12 21.
Article in English | MEDLINE | ID: mdl-29322938

ABSTRACT

BACKGROUND: Carbonylation, which occurs through the oxidation of specific residues by reactive oxygen species (ROS), is an irreversible oxidative modification of proteins. Carbonylation has been reported to be related to a number of metabolic or aging diseases, including diabetes, chronic lung disease, Parkinson's disease, and Alzheimer's disease. Owing to the lack of computational methods dedicated to exploring motif signatures of protein carbonylation sites, we were motivated to exploit an iterative statistical method to characterize and identify carbonylated sites with motif signatures. RESULTS: By manually curating experimental data from research articles, we obtained 332, 144, 135, and 140 verified substrate sites for K (lysine), R (arginine), T (threonine), and P (proline) residues, respectively, from 241 carbonylated proteins. To examine informative attributes for classifying carbonylated versus non-carbonylated sites, various features, including the composition of the twenty amino acids (AAC), composition of amino acid pairs (AAPC), position-specific scoring matrix (PSSM), and positional weighted matrix (PWM), were investigated in this study. Additionally, to explore the motif signatures of carbonylation sites, an iterative statistical method was adopted to detect statistically significant dependencies of amino acid composition between specific positions around substrate sites. A profile hidden Markov model (HMM) was then used to train a predictive model from each motif signature. Moreover, a support vector machine (SVM) was used to construct an integrative model combining the bit scores obtained from the profile HMMs. The combined model provided enhanced performance, with balanced predictive sensitivity and specificity in both cross-validation and independent testing.
CONCLUSION: This study provides a new scheme for exploring potential motif signatures at substrate sites of protein carbonylation. The usefulness of the revealed motifs in identifying carbonylated sites is demonstrated by their effective performance in cross-validation and independent testing. Finally, these substrate motifs were adopted to build an online resource (MDD-Carb, http://csb.cse.yzu.edu.tw/MDDCarb/) and are also anticipated to facilitate the study of large-scale carbonylated proteomes.


Subject(s)
Models, Molecular , Protein Carbonylation , Proteins/chemistry , Proteins/metabolism , Amino Acid Motifs , Amino Acid Sequence , Binding Sites , Internet
13.
BMC Syst Biol ; 10 Suppl 1: 6, 2016 Jan 11.
Article in English | MEDLINE | ID: mdl-26818456

ABSTRACT

BACKGROUND: The conjugation of ubiquitin to a substrate protein (protein ubiquitylation), which involves a sequential process of E1 activation, E2 conjugation and E3 ligation, is crucial to the regulation of protein function and activity in eukaryotes. This ubiquitin-conjugation process typically binds the last amino acid of ubiquitin (glycine 76) to a lysine residue of a target protein. High-throughput mass spectrometry-based proteomics has stimulated the large-scale identification of ubiquitin-conjugated peptides. Hence, a new web resource, UbiSite, was developed to identify ubiquitin-conjugation sites on lysines based on a large-scale proteome dataset. RESULTS: Given a total of 37,647 ubiquitin-conjugated proteins, including 128,026 ubiquitylated peptides, obtained from various resources, this study carries out a large-scale investigation of ubiquitin-conjugation sites based on sequence and structural characteristics. TwoSampleLogo analysis reveals a significant depletion of histidine (H), arginine (R) and cysteine (C) residues around ubiquitylation sites, which may impact the conjugation of ubiquitin in closed three-dimensional environments. Based on the large-scale ubiquitylation dataset, a motif discovery tool, MDDLogo, was adopted to characterize the potential substrate motifs for ubiquitin conjugation. Not only were single features such as amino acid composition (AAC), positional weighted matrix (PWM), position-specific scoring matrix (PSSM) and solvent-accessible surface area (SASA) considered, but the effectiveness of incorporating MDDLogo-identified substrate motifs into a two-layered prediction model was also taken into account. Evaluation by five-fold cross-validation showed that PSSM is the best feature for discriminating between ubiquitylation and non-ubiquitylation sites using a support vector machine (SVM).
Additionally, the two-layered SVM model integrating MDDLogo-identified substrate motifs achieved a promising accuracy of 81.06% and a Matthews correlation coefficient (MCC) of 0.586. Furthermore, independent testing showed that the two-layered SVM model could outperform other prediction tools, reaching 85.10% sensitivity, 69.69% specificity, 73.69% accuracy and an MCC of 0.483. CONCLUSION: The independent testing result indicated the effectiveness of incorporating MDDLogo-identified motifs into the prediction of ubiquitylation sites. To provide meaningful assistance to researchers interested in large-scale ubiquitinome data, the two-layered SVM model has been implemented as a web-based system (UbiSite), which is freely available at http://csb.cse.yzu.edu.tw/UbiSite/. Two case studies on UbiSite demonstrate effective identification of ubiquitylation sites with reference to substrate motifs.
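TwoSampleLogo-style findings, such as the depletion of H, R and C around ubiquitylation sites, rest on comparing position-specific residue frequencies between positive and negative windows. The sketch below computes only the raw frequency differences; TwoSampleLogo itself additionally applies statistical testing to decide which differences are significant, which is omitted here. The function name is hypothetical.

```python
from collections import Counter


def positional_frequency_difference(pos_windows: list[str],
                                    neg_windows: list[str]) -> list[dict[str, float]]:
    """For each aligned position, residue frequency in positive windows minus
    frequency in negative windows. Positive values suggest enrichment, negative
    values depletion; no significance test is applied."""
    diffs = []
    for p in range(len(pos_windows[0])):
        pos_counts = Counter(w[p] for w in pos_windows)
        neg_counts = Counter(w[p] for w in neg_windows)
        residues = set(pos_counts) | set(neg_counts)
        diffs.append({
            r: pos_counts[r] / len(pos_windows) - neg_counts[r] / len(neg_windows)
            for r in residues  # Counter returns 0 for residues absent from one set
        })
    return diffs


if __name__ == "__main__":
    # Hypothetical aligned windows (center residue K at index 1).
    d = positional_frequency_difference(["AKH", "CKA"], ["HKH", "HKR"])
    print(d[0])
```

Using frequencies rather than counts makes positive and negative sets of different sizes directly comparable.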


Subject(s)
Amino Acid Motifs , Lysine/chemistry , Machine Learning , Ubiquitin/chemistry , Amino Acid Sequence , Datasets as Topic , Mass Spectrometry , Models, Molecular , Protein Domains , Proteome , Proteomics/methods , Sequence Analysis, Protein , Software , Ubiquitination
14.
Nucleic Acids Res ; 44(D1): D435-46, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26578568

ABSTRACT

Owing to the importance of the post-translational modifications (PTMs) of proteins in regulating biological processes, the dbPTM (http://dbPTM.mbc.nctu.edu.tw/) was developed as a comprehensive database of experimentally verified PTMs from several databases with annotations of potential PTMs for all UniProtKB protein entries. For this 10th anniversary of dbPTM, the updated resource provides not only a comprehensive dataset of experimentally verified PTMs, supported by the literature, but also an integrative interface for accessing all available databases and tools that are associated with PTM analysis. As well as collecting experimental PTM data from 14 public databases, this update manually curates over 12 000 modified peptides, including the emerging S-nitrosylation, S-glutathionylation and succinylation, from approximately 500 research articles, which were retrieved by text mining. As the number of available PTM prediction methods increases, this work compiles a non-homologous benchmark dataset to evaluate the predictive power of online PTM prediction tools. An increasing interest in the structural investigation of PTM substrate sites motivated the mapping of all experimental PTM peptides to protein entries of Protein Data Bank (PDB) based on database identifier and sequence identity, which enables users to examine spatially neighboring amino acids, solvent-accessible surface area and side-chain orientations for PTM substrate sites on tertiary structures. Since drug binding in PDB is annotated, this update identified over 1100 PTM sites that are associated with drug binding. The update also integrates metabolic pathways and protein-protein interactions to support the PTM network analysis for a group of proteins. Finally, the web interface is redesigned and enhanced to facilitate access to this resource.


Subject(s)
Databases, Protein , Protein Processing, Post-Translational , Binding Sites , Disease , Glycosylation , Metabolic Networks and Pathways , Pharmaceutical Preparations/chemistry , Protein Conformation , Protein Interaction Mapping
15.
BMC Bioinformatics ; 16 Suppl 18: S10, 2015.
Article in English | MEDLINE | ID: mdl-26680539

ABSTRACT

Protein O-GlcNAcylation, involving the β-attachment of a single N-acetylglucosamine (GlcNAc) to the hydroxyl group of serine or threonine residues, is an O-linked glycosylation catalyzed by O-GlcNAc transferase (OGT). Molecular-level investigation of the basis for OGT's substrate specificity should aid in understanding how O-GlcNAc contributes to diverse cellular processes. Because of the increasing number of O-GlcNAcylated peptides with site-specific information identified by mass spectrometry (MS)-based proteomics, we were motivated to characterize the substrate site motifs of O-GlcNAc transferases. In this investigation, a non-redundant dataset of 410 experimentally verified O-GlcNAcylation sites was manually extracted from dbOGAP, OGlycBase and UniProtKB. After conserved motifs were detected using maximal dependence decomposition, a profile hidden Markov model (profile HMM) was adopted to learn a first-layer model for each identified OGT substrate motif. A Support Vector Machine (SVM) was then used to generate a second-layer model learned from the output values of the first-layer profile HMMs. The two-layered predictive model was evaluated using five-fold cross-validation, which yielded a sensitivity of 85.4%, a specificity of 84.1%, and an accuracy of 84.7%. Additionally, an independent testing set from PhosphoSitePlus, which was non-homologous to the training data of the predictive model, was used to demonstrate that the proposed method could provide a promising accuracy (84.05%) and outperform other O-GlcNAcylation site prediction tools. A case study indicated that the proposed method could be a feasible means of conducting preliminary analyses of protein O-GlcNAcylation. The method has been implemented as a web-based system, OGTSite, which is now freely available at http://csb.cse.yzu.edu.tw/OGTSite/.
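The abstract reports sensitivity, specificity and accuracy. The sketch below shows how these standard metrics are computed from a binary confusion matrix. The counts used are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion-matrix counts (positive = O-GlcNAcylation site)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def sensitivity(c: Confusion) -> float:
    """TP / (TP + FN): share of real sites that are predicted as sites."""
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    """TN / (TN + FP): share of non-sites that are predicted as non-sites."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: Confusion) -> float:
    """(TP + TN) / all predictions."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


# Hypothetical counts for illustration only.
c = Confusion(tp=80, fp=10, tn=90, fn=20)
print(sensitivity(c), specificity(c), accuracy(c))
```

In k-fold cross-validation, counts are often summed across folds before computing the metrics. Whether the study pooled counts or averaged per-fold values is not stated in the abstract.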


Subject(s)
Machine Learning , N-Acetylglucosaminyltransferases/metabolism , Proteins/chemistry , Acetylglucosamine/metabolism , Algorithms , Amino Acid Motifs , Glycosylation , Internet , Mass Spectrometry , Peptides/analysis , Peptides/metabolism , Proteins/metabolism , Substrate Specificity , Support Vector Machine , User-Computer Interface
16.
BMC Bioinformatics ; 15 Suppl 16: S1, 2014.
Article in English | MEDLINE | ID: mdl-25521204

ABSTRACT

BACKGROUND: Protein O-GlcNAcylation involves the attachment of a single N-acetylglucosamine (GlcNAc) to the hydroxyl group of serine or threonine residues. Elucidation of O-GlcNAcylation sites on proteins is required in order to decipher its crucial roles in regulating cellular processes and to aid in drug design. With an increasing number of O-GlcNAcylation sites identified by mass spectrometry (MS)-based proteomics, several methods have been proposed for the computational identification of O-GlcNAcylation sites. However, no method has yet focused on the investigation of O-GlcNAcylated substrate motifs. We were therefore motivated to design a new method for the identification of protein O-GlcNAcylation sites that takes substrate site specificity into account. RESULTS: In this study, 375 experimentally verified O-GlcNAcylation sites were collected from dbOGAP, an integrated resource for protein O-GlcNAcylation. Because the substrate motifs are difficult to characterize by conventional sequence logo analysis, a recursive statistical method was applied to obtain significantly conserved motifs. Support Vector Machines (SVMs) were adopted to construct predictive models learned from the identified substrate motifs. Five-fold cross-validation was used to evaluate the predictive model, which achieved a sensitivity, specificity, and accuracy of 0.76, 0.80, and 0.78, respectively. Additionally, an independent testing set, which was completely blind to the training data of the predictive model, was used to demonstrate that the proposed method could provide a promising accuracy (0.94) and outperform three other O-GlcNAcylation site prediction tools. CONCLUSION: This work proposed a computational method to identify informative substrate motifs for O-GlcNAcylation sites. Cross-validation and independent testing indicated that the identified motifs were effective in identifying O-GlcNAcylation sites. A case study demonstrated that the proposed method could be a feasible means of conducting preliminary analyses of protein O-GlcNAcylation. We also anticipate that the revealed substrate motifs may facilitate the study of extensive crosstalk between O-GlcNAcylation and phosphorylation. This method may help unravel their mechanisms and roles in signaling, transcription, chronic disease, and cancer.
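The abstract evaluates the SVM models with five-fold cross-validation. The sketch below shows one common way to build the folds. It assumes a stratified split, meaning each fold keeps roughly the original ratio of positive to negative sites. The abstract does not say whether stratification was used, so treat that choice as an assumption. The function name `stratified_k_fold` is hypothetical.

```python
import random
from typing import List, Tuple


def stratified_k_fold(labels: List[int], k: int = 5,
                      seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs. Indices of each class are
    shuffled, then dealt round-robin into folds so class ratios are preserved."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    all_idx = set(range(len(labels)))
    # Each fold serves once as the test set; the remaining folds form the training set.
    return [(sorted(all_idx - set(f)), sorted(f)) for f in folds]


# Hypothetical data: 10 positive sites (1) and 20 negative sites (0).
labels = [1] * 10 + [0] * 20
for train, test in stratified_k_fold(labels):
    print(len(train), len(test), sum(labels[i] for i in test))
```

Each model is trained on four folds and scored on the held-out fold. The five test-fold results are then combined into the reported sensitivity, specificity and accuracy.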


Subject(s)
Acetylglucosamine/chemistry , Acetylglucosamine/metabolism , Computational Biology/methods , Protein Processing, Post-Translational , Proteins/chemistry , Proteins/metabolism , Amino Acid Motifs , Glycosylation , Humans , Mass Spectrometry , Models, Molecular , Phosphorylation , Proteomics , Signal Transduction , Substrate Specificity , Support Vector Machine
18.
Nucleic Acids Res ; 42(Database issue): D537-45, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24302577

ABSTRACT

Transmembrane (TM) proteins have crucial roles in various cellular processes. The location of post-translational modifications (PTMs) on TM proteins is associated with their functional roles in these processes. Given the importance of PTMs in the functioning of TM proteins, this study developed topPTM (available online at http://topPTM.cse.yzu.edu.tw), a new dbPTM module that provides a public resource for identifying functional PTM sites on TM proteins with structural topology. Experimentally verified TM topology data were integrated from TMPad, TOPDB, PDBTM and OPM. In addition to the PTMs obtained from dbPTM, experimentally verified PTM sites were manually extracted from research articles retrieved by text mining. To provide a full investigation of PTM sites on TM proteins, all UniProtKB protein entries containing annotations related to membrane localization and TM topology were considered potential TM proteins. Two effective tools were then used to annotate the structural topology of these potential TM proteins. The TM topology of each protein is displayed graphically together with its PTM sites. To delineate the structural correlation between PTM sites and TM topologies, the tertiary structure of PTM sites on TM proteins was visualized using the Jmol program. Based on manually curated research articles and the investigation of domain-domain interactions in the Protein Data Bank, 1347 PTM substrate sites were associated with protein-protein interactions for 773 TM proteins. The database content is regularly updated as new data are published, through continuous surveys of research articles and available resources.
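The abstract describes relating PTM sites to TM topology. The sketch below shows the basic lookup involved: deciding which topological segment a modified residue falls in. The `Segment` structure and the region labels ("inside", "membrane", "outside") are hypothetical illustrations and are not topPTM's actual data model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    start: int  # 1-based residue position, inclusive
    end: int    # 1-based residue position, inclusive
    kind: str   # hypothetical label, e.g. "inside", "membrane", "outside"


def topology_of_site(position: int, segments: List[Segment]) -> Optional[str]:
    """Return the topology label of the segment containing a PTM site,
    or None if the position is not covered by any annotated segment."""
    for s in segments:
        if s.start <= position <= s.end:
            return s.kind
    return None


# Hypothetical single-pass TM protein topology.
topology = [Segment(1, 20, "outside"), Segment(21, 43, "membrane"), Segment(44, 120, "inside")]
print(topology_of_site(50, topology))   # site in the cytoplasmic region
print(topology_of_site(200, topology))  # site outside the annotated range
```

A linear scan is enough for the handful of segments in a typical TM protein. A sorted list with binary search would only matter for very large numbers of segments.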


Subject(s)
Databases, Protein , Membrane Proteins/metabolism , Protein Processing, Post-Translational , Internet , Membrane Proteins/chemistry , Protein Structure, Tertiary
19.
Acta Neurol Taiwan ; 14(4): 187-90, 2005 Dec.
Article in English | MEDLINE | ID: mdl-16425545

ABSTRACT

Depression is a frequent and important problem for patients who have experienced strokes. The purpose of this study was to assess the prevalence of depressive symptoms, their clinical correlates, and the effects of depressive symptoms on stroke recovery. A consecutive cohort of 207 ischemic stroke patients with a mean age of 64 years was studied to ascertain any correlation between potential risk factors and the incidence of post-stroke depression (PSD). Depressive symptoms were relatively common (34.3% with a Hamilton Depression Rating Scale [HDRS] score > 10), but the prevalence of severe depression (HDRS > 17) was only 7.7%. Patients with depressive symptoms were more likely to be female, to have a family history of depression, and to have a poor functional outcome. There were no significant associations between depressive symptoms and age, marital status, location of the stroke lesion, or time since stroke onset. Our findings indicate that depressive symptoms occurred in about one third of post-stroke patients. There is a negative correlation between depressive symptoms and the functional status of the patients.


Subject(s)
Brain Ischemia/psychology , Depression/epidemiology , Stroke/psychology , Activities of Daily Living , Adult , Aged , Aged, 80 and over , Female , Humans , Male , Middle Aged , Prevalence , Regression Analysis